Fusion of Loops for Parallelism and Locality

نویسندگان

  • Naraig Manjikian
  • Tarek S. Abdelrahman
چکیده

Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-carried dependences which reduce parallelism. In addition, performance losses result from cache conflicts in fused loops. We present new, systematic techniques which: (1) allow fusion of loop nests in the presence of fusion-preventing dependences, (2) allow parallel execution of fused loops with minimal synchronization, and (3) eliminate cache conflicts in fused loops. We evaluate our techniques on a 56-processor KSR2 multiprocessor, and show improvements of up to 20% for representative loop nest sequences. The results also indicate a performance tradeoff as more processors are used, suggesting careful evaluation of the profitability of fusion.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Maximizing Loop

Loop fusion is a program transformation that merges multiple loops into one. It is eeective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...

متن کامل

Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution

Loop fusion is a program transformation that merges multiple loops into one. It is eeective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...

متن کامل

Polynomial-Time Nested Loop Fusion with Full Parallelism

Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way for reducing synchronization and improving data locality. Traditional fusion techniques, however, either can not address the case when fusion-preventing dependences exist in nested loops, or can not achieve good parallelism after fu...

متن کامل

With-Loop Fusion for Data Locality and Parallelism

Abstract. With-loops are versatile array comprehensions used in the functional array language SaC to implement universally applicable array operations. We describe the fusion of with-loops as a novel optimization technique to improve the data locality of compiled code. Experiments based on selected benchmark programs show the significance of withloop fusion for achieving competitive runtime per...

متن کامل

Efficient Polynomial-Time Nested Loop Fusion with Full Parallelism

Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way for reducing synchronization and improving data locality. Traditional fusion techniques, however, either can not address the case when fusion-preventing dependencies exist in nested loops, or can not achieve good parallelism after f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1995